Robust Named Entity Extraction from Large Spoken Archives

Authors

Benoît Favre

Frédéric Béchet

Pascal Nocera

Abstract

Traditional approaches to Information Extraction (IE) from speech input simply consist of applying text-based methods to the output of an Automatic Speech Recognition (ASR) system. While this gives satisfactory results on transcripts with a low Word Error Rate (WER), we believe that a tighter integration of the IE and ASR modules can increase IE performance in more difficult conditions. More specifically, this paper focuses on the robust extraction of Named Entities from speech input when there is a temporal mismatch between the training and test corpora. We describe a Named Entity Recognition (NER) system, developed within the French Rich Broadcast News Transcription program ESTER, which is specifically optimized to process ASR transcripts and can be integrated into the search process of the ASR modules. Finally, we show how some metadata can be collected in order to adapt the NER and ASR models to new conditions, and how they can be used in a task of Named Entity indexing of spoken archives.
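The abstract ties IE performance to the Word Error Rate of the ASR output. As a reminder of what that metric measures, the following is a minimal sketch of the standard definition, WER = (substitutions + deletions + insertions) / number of reference words, computed with a word-level Levenshtein alignment. It is not the authors' scoring tool: the function name `word_error_rate` and the plain whitespace tokenization are assumptions for illustration, and the sketch omits any text normalization (case, punctuation) that a real scoring setup would have to specify.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level edit distance."""
    ref = reference.split()  # assumption: whitespace tokenization, no normalization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # aligning ref[:i] to nothing costs i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # A misrecognized named entity: "Marseille" -> "my say"
    print(word_error_rate("the mayor of Marseille spoke today",
                          "the mayor of my say spoke today"))
```

In the usage example, the split-up entity costs one substitution and one insertion, giving 2/6 ≈ 0.333. Because insertions are counted, WER can exceed 1.0, which is one reason a single WER figure says little about how many named entities survived recognition.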


Similar resources

Mining broadcast news data: robust information extraction from word lattices

Fine-grained information extraction performance from spoken corpora is strongly correlated with the Word Error Rate (WER) of the automatic transcriptions processed. Despite the recent advances in Automatic Speech Recognition (ASR) methods, high WER transcriptions are common when dealing with unmatched conditions between the documents to process and those used to train the ASR models. Such misma...


Mining Broadcast News data: Robust Info Lattices


Beyond ASR 1-best: Using word confusion networks in spoken language understanding

We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large vocabulary spoken language understanding is robustness to ASR errors. State of the art spoken language understanding relies on the best ASR hypotheses...


OOV Sensitive Named-Entity Recognition in Speech

Named Entity Recognition (NER), an information extraction task, is typically applied to spoken documents by cascading a large vocabulary continuous speech recognizer (LVCSR) and a named entity tagger. Recognizing named entities in automatically decoded speech is difficult since LVCSR errors can confuse the tagger. This is especially true of out-of-vocabulary (OOV) words, which are often named e...


Recognizing named entities in spoken Chinese dialogues with a character-level maximum entropy tagger

Named Entity Recognition (NER) is an important task in information extraction, where major attention has been paid to written texts of a news or academic paper (esp. biomedical) style. In this paper we report the first piece of work on NER in spoken Chinese dialogues, as a preliminary step for spoken language understanding. The NER task is taken as a sequential classification problem and solved...




Publication date: 2005
